Mapping human lymph node cell types to 10X Visium - estimating reference expression signatures

Cell2location maps cell types by integrating single cell/nucleus and spatial transcriptomics data. It does this by estimating which combination of cell types, and in what abundance, could have produced the mRNA counts observed in the spatial data, while taking technical effects into account (platform/technology effect, contaminating RNA, unexplained variance).
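
The core idea can be sketched numerically. The snippet below is a simplified illustration, not the cell2location model itself: it assumes that the expected mRNA count of a gene at a location is the abundance-weighted sum of per-cell-type average counts (reference signatures), and it leaves out the technical effects listed above. All array names and values are made up for illustration.

```python
import numpy as np

# Hypothetical reference signatures: rows = cell types, columns = genes
# (average mRNA count of each gene in each cell type).
signatures = np.array([
    [10.0, 0.0, 2.0],  # cell type A
    [1.0, 8.0, 2.0],   # cell type B
])

# Hypothetical cell abundance: rows = locations, columns = cell types.
abundance = np.array([
    [3.0, 0.0],  # location 1: three cells of type A
    [1.0, 2.0],  # location 2: one cell of type A and two of type B
])

# Expected counts per location and gene: sum over cell types of
# abundance * signature (technical effects are ignored in this sketch).
expected_counts = abundance @ signatures
print(expected_counts)
```

Mapping runs this logic in reverse: given observed counts and the signatures, the model infers the abundance matrix.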

Given a cell type annotation for each cell, the reference cell type signatures $g_{f,g}$, which represent the average mRNA count of each gene $g$ in each cell type $f \in \{1, \dots, F\}$, can be estimated from sc/snRNA-seq data using the 2 provided methods (see below). Cell2location needs untransformed, unnormalised spatial mRNA counts as input. You also need to provide cell2location with the expected average cell abundance per location, which is used as a prior to guide estimation of absolute cell abundance. This value depends on the tissue and can be estimated by counting nuclei for a few locations in the paired histology image; an approximate value is sufficient (see paper methods for more guidance).
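
As a small worked example of the last point, the expected average cell abundance can be approximated by averaging nuclei counts from a handful of locations. The counts and variable names below are hypothetical.

```python
from statistics import mean

# Hypothetical nuclei counts from a few locations in the histology image.
nuclei_per_location = [6, 9, 7, 11, 8, 7]

# A rounded average is enough, because the value only serves as a prior.
expected_cells_per_location = round(mean(nuclei_per_location))
print(expected_cells_per_location)
```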

We provide 2 methods for estimating reference expression signatures of cell types from scRNA-seq data:

1) a statistical method based on Negative Binomial (NB) regression. We generally recommend NB regression because it robustly combines data across technologies and batches, which improves spatial mapping accuracy. This notebook shows its use on a dataset composed of multiple batches and technologies.

2) hard-coded computation of per-cluster average mRNA counts for individual genes (scvi.external.cell2location.compute_cluster_averages). When the batch effects are small, this faster hard-coded method of computing per cluster averages provides similarly high accuracy. We also recommend the hard-coded method for non-UMI technologies such as Smart-Seq 2.
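
To make the second method concrete, here is a minimal NumPy sketch of per-cluster averaging. It is not the implementation of the library function named above, which should be used in practice; the count matrix and labels are made up for illustration.

```python
import numpy as np

# Hypothetical raw count matrix: rows = cells, columns = genes.
counts = np.array([
    [4, 0, 1],
    [6, 0, 3],
    [0, 5, 2],
    [0, 7, 2],
    [0, 9, 2],
])
# Hypothetical cluster (cell type) label for each cell.
labels = np.array(["B cell", "B cell", "T cell", "T cell", "T cell"])

# Average the counts of all cells that share a label, one row per cluster.
clusters = np.unique(labels)
cluster_means = np.vstack([counts[labels == c].mean(axis=0) for c in clusters])

for name, row in zip(clusters, cluster_means):
    print(name, row)
```

Plain averaging does not separate batch or technology effects from the cell type signal, which is why it is suggested only when batch effects are small.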

Loading packages

Loading Visium data

First let's read spatial Visium data from 10X Space Ranger output.

Load reference cell type expression signatures

The signatures were estimated from scRNA-seq data, accounting for batch effect, using a separate model, as shown here: https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_estimating_signatures.html

Here we download the h5ad object with results:

    import os
    import scanpy as sc

    # Download the h5ad object with the estimated signatures if it is missing
    if not os.path.exists('./data/sc.h5ad'):
        !cd ./data/ && wget https://cell2location.cog.sanger.ac.uk/paper/integrated_lymphoid_organ_scrna/RegressionNBV4Torch_57covariates_73260cells_10237genes/sc.h5ad

    # read data
    # (the paths below are specific to the original compute environment;
    # point reg_path at the folder that holds your copy of sc.h5ad)
    reg_mod_name = 'RegressionNBV2Torch_65covariates_40532cells_12819genes'
    results_folder2 = '/nfs/team205/vk7/sanger_projects/cell2location_paper/notebooks/results/mouse_viseum_snrna/'
    reg_path = f'{results_folder2}regression_model/{reg_mod_name}/'
    adata_snrna_raw = sc.read(f'{reg_path}sc.h5ad')

    # Store unique gene symbols in .var['SYMBOL']
    adata_snrna_raw_copy = adata_snrna_raw.copy()
    adata_snrna_raw_copy.var_names = adata_snrna_raw_copy.var['SYMBOL'].astype(str)
    adata_snrna_raw_copy.var_names_make_unique()
    adata_snrna_raw.var['SYMBOL'] = adata_snrna_raw_copy.var_names
    #adata_snrna_raw.X = adata_snrna_raw.raw.X

    # Select the per-cell-type signature columns and name them by cell type
    cell_types = adata_snrna_raw.obs['annotation_1'].unique()
    inf_aver = adata_snrna_raw.var.copy()
    inf_aver = inf_aver.loc[:, [f'mean_cov_effect_annotation_1_{i}' for i in cell_types]]
    inf_aver.columns = list(cell_types)
    inf_aver = inf_aver.iloc[:, inf_aver.columns.argsort()]

    # Index the signatures by gene symbol
    inf_aver.index = adata_snrna_raw.var.loc[inf_aver.index, 'SYMBOL']

    # scale up by average sample scaling factor
    inf_aver = inf_aver * adata_snrna_raw.uns['regression_mod']['post_sample_means']['sample_scaling'].mean()

Train scvi-cell2location

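    import os
    import torch

    # Use the first CUDA device for training
    torch.cuda.set_device(0)

    # Create the output folder for this run (no error if it already exists)
    os.makedirs(f"{scvi_run_name}_c2l_amortised", exist_ok=True)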

Plot cell abundance in spatial coordinates

Perform clustering of cell abundance estimates to identify tissue regions

We find regions by clustering locations/spots (Leiden) based on estimated cell abundance of each cell type. Results are saved in adata_vis.obs['region_cluster'].
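
Leiden clustering operates on a graph of locations. As a rough sketch of the input to that step, assuming the graph is a k-nearest-neighbour graph in cell abundance space (a common choice, not confirmed by the text above), the snippet below finds the nearest neighbours of each location with NumPy. It does not perform Leiden clustering, and the abundance values are hypothetical.

```python
import numpy as np

# Hypothetical abundance estimates: rows = locations, columns = cell types.
abundance = np.array([
    [5.0, 0.2],
    [4.8, 0.1],
    [5.1, 0.3],
    [0.3, 6.0],
    [0.2, 5.7],
    [0.1, 6.2],
])
k = 2  # number of neighbours per location (illustrative choice)

# Pairwise Euclidean distances between locations in abundance space.
diff = abundance[:, None, :] - abundance[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)  # a location is not its own neighbour

# Indices of the k closest locations for each location.
neighbours = np.argsort(dist, axis=1)[:, :k]
print(neighbours)
```

Locations with similar cell type composition end up connected to each other, so a graph clustering method can group them into the same region.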

Advanced use examples

    import matplotlib as mpl
    import matplotlib.pyplot as plt

    # Get posterior distribution samples for specific variables
    samples_w_sf = mod.sample_posterior(num_samples=1000, use_gpu=True, return_samples=True,
                                        batch_size=2020,
                                        return_sites=['w_sf', 'm_g', 'u_sf_mRNA_factors'])
    # samples_w_sf['posterior_samples'] contains 1000 samples as arrays with dim=(num_samples, ...)
    samples_w_sf['posterior_samples']['w_sf'].shape

    # Compute any quantile of the posterior distribution
    medians = mod.posterior_quantile(q=0.5, use_gpu=True)

    # Compare the posterior median with the posterior mean of cell abundance
    with mpl.rc_context({'axes.facecolor': 'white', 'figure.figsize': [5, 5]}):
        plt.scatter(medians['w_sf'].flatten(),
                    mod.samples['post_sample_means']['w_sf'].flatten())
        plt.xlabel('median')
        plt.ylabel('mean')
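
The layout dim=(num_samples, ...) means that any summary is taken over the first axis. The sketch below illustrates this with synthetic samples. It is not the model's own code, and mod.posterior_quantile may compute quantiles differently (for example without drawing samples). The array sizes and the Gamma distribution are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior samples with dim=(num_samples, locations, cell types).
num_samples, n_locations, n_cell_types = 1000, 4, 3
samples = rng.gamma(shape=2.0, scale=1.5,
                    size=(num_samples, n_locations, n_cell_types))

# Summaries are taken over the sample axis (axis 0).
post_mean = samples.mean(axis=0)
post_median = np.quantile(samples, 0.5, axis=0)
post_q05 = np.quantile(samples, 0.05, axis=0)

print(post_mean.shape, post_median.shape, post_q05.shape)
```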

Modules and their versions used for this analysis

    import session_info

    # Print the versions of all imported modules
    session_info.show()